Policies produced by deep reinforcement learning are typically characterised by their learning curves, but they remain poorly understood in many other respects. ReLU-based policies result in a partitioning of the input space into piecewise linear regions. We seek to understand how observed region counts and their densities evolve during deep reinforcement learning using empirical results that span a range of continuous control tasks and policy network dimensions. Intuitively, we may expect that during training, the region density increases in the areas that are frequently visited by the policy, thereby affording fine-grained control. We use recent theoretical and empirical results for the linear regions induced by neural networks in supervised learning settings for grounding and comparison of our results. Empirically, we find that the region density increases only moderately throughout training, as measured along fixed trajectories coming from the final policy. However, the trajectories themselves also increase in length during training, and thus the region densities decrease as seen from the perspective of the current trajectory. Our findings suggest that the complexity of deep reinforcement learning policies does not principally emerge from a significant growth in the complexity of functions observed on-and-around trajectories of the policy.
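One hedged way to make the region-counting measurement concrete: each distinct on/off pattern of the ReLU units identifies one linear region, so counting distinct activation patterns along a densely sampled trajectory estimates the number of regions visited and their density per unit of path length. The two-layer policy, its shapes, and the straight-line trajectory below are illustrative assumptions, not the networks or tasks studied here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2-hidden-layer ReLU "policy" with random weights; the
# shapes are stand-ins, not a trained agent.
W1, b1 = rng.normal(size=(64, 8)), rng.normal(size=64)
W2, b2 = rng.normal(size=(64, 64)), rng.normal(size=64)

def activation_pattern(state):
    """On/off pattern of every ReLU unit; each distinct pattern
    corresponds to one linear region of the piecewise-linear policy."""
    h1 = W1 @ state + b1
    h2 = W2 @ np.maximum(h1, 0.0) + b2
    return tuple((h1 > 0).tolist() + (h2 > 0).tolist())

def regions_along(trajectory):
    """Number of distinct linear regions visited along a state trajectory."""
    return len({activation_pattern(s) for s in trajectory})

# A straight-line "trajectory" through the 8-D state space, densely sampled.
traj = np.linspace(-1.0, 1.0, 500)[:, None] * np.ones(8)
n = regions_along(traj)
length = np.linalg.norm(np.diff(traj, axis=0), axis=1).sum()
print(n, "regions visited; density =", n / length)
```

Under this scheme, a longer trajectory with the same region count has a lower density, which is the trajectory-relative perspective the abstract describes.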
In a sequential decision-making problem, having a structural dependency amongst the reward distributions associated with the arms makes it challenging to identify a subset of alternatives that guarantees the optimal collective outcome. Thus, besides individual actions' reward, learning the causal relations is essential to improve the decision-making strategy. To solve the two-fold learning problem described above, we develop the 'combinatorial semi-bandit framework with causally related rewards', where we model the causal relations by a directed graph in a stationary structural equation model. The nodal observation in the graph signal comprises the corresponding base arm's instantaneous reward and an additional term resulting from the causal influences of other base arms' rewards. The objective is to maximize the long-term average payoff, which is a linear function of the base arms' rewards and depends strongly on the network topology. To achieve this objective, we propose a policy that determines the causal relations by learning the network's topology and simultaneously exploits this knowledge to optimize the decision-making process. We establish a sublinear regret bound for the proposed algorithm. Numerical experiments using synthetic and real-world datasets demonstrate the superior performance of our proposed method compared to several benchmarks.
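A minimal sketch of the kind of stationary structural equation model described, assuming the specific linear form z = f + A z for the graph signal (the paper's exact parametrization is not given here): with the DAG's adjacency matrix A, the observed nodal rewards solve a linear system, which makes the payoff of any chosen subset of base arms depend strongly on the network topology.

```python
import numpy as np

# Hypothetical DAG over 4 base arms: A[i, j] > 0 means arm j's reward
# causally influences the observation at arm i (form assumed).
A = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 0.3, 0.0, 0.0],
    [0.2, 0.0, 0.4, 0.0],
])

f = np.array([1.0, 0.5, 0.8, 0.2])  # instantaneous base-arm rewards

# Stationary structural equation model: z = f + A z  =>  z = (I - A)^{-1} f
z = np.linalg.solve(np.eye(4) - A, f)
print(z)

# The payoff of a super arm (subset of base arms) is linear in the
# observations, so the topology encoded in A shapes which subset is optimal.
super_arm = np.array([1, 1, 0, 1])  # select arms 0, 1, 3
print(float(super_arm @ z))
```

Learning A from nodal observations and exploiting it when selecting super arms is the two-fold problem the proposed policy addresses.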
Multi-document summarization (MDS) has traditionally been studied assuming a set of ground-truth topic-related input documents is provided. In practice, the input document set is unlikely to be available a priori and would need to be retrieved based on an information need, a setting we call open-domain MDS. We experiment with current state-of-the-art retrieval and summarization models on several popular MDS datasets extended to the open-domain setting. We find that existing summarizers suffer large reductions in performance when applied as-is to this more realistic task, though training summarizers with retrieved inputs can reduce their sensitivity to retrieval errors. To further probe these findings, we conduct perturbation experiments on summarizer inputs to study the impact of different types of document retrieval errors. Based on our results, we provide practical guidelines to help facilitate a shift to open-domain MDS. We release our code and experimental results, alongside all data or model artifacts created during our investigation.
Logic obfuscation is a key defense against multiple hardware threats on integrated circuits (ICs), including reverse engineering (RE) and intellectual property (IP) theft. The effectiveness of logic obfuscation has been challenged by the recently introduced Boolean satisfiability (SAT) attack and its variants. A large number of countermeasures has also been proposed to thwart the SAT attack. Irrespective of the deployed defense against SAT attacks, large power, performance, and area overheads are indispensable. In contrast, we propose a cognitive solution: a neural-network-based UNSAT clause translator, SATConda, that incurs minimal area and power overhead while preserving the original functionality with impenetrable security. SATConda is incubated with an UNSAT clause generator that transforms an existing conjunctive normal form (CNF) through minimal perturbations, such as the inclusion of a pair of inverters or buffers, or adds a new lightweight UNSAT block depending on the provided CNF. For efficient UNSAT clause generation, SATConda is equipped with a multi-layer neural network that first learns the dependencies of the features (literals and clauses), followed by a long short-term memory (LSTM) network that validates and backtracks the SAT-hardness for better learning and translation. Our proposed SATConda is evaluated on the ISCAS85 and ISCAS89 benchmarks and is found to defend against multiple state-of-the-art SAT attacks devised for hardware RE. In addition, we evaluate the empirical performance of our proposed SATConda against the MiniSat, Lingeling, and Glucose SAT solvers, which form the base of many existing deobfuscation SAT attacks.
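A toy illustration (not SATConda's learned translator) of why appending a lightweight UNSAT block matters for SAT-based deobfuscation: a brute-force check shows the original CNF is satisfiable, while the CNF augmented with a contradictory block on a fresh variable is not, so a SAT attack on the augmented formula stalls. The example CNF and the two-clause block are assumptions made for illustration only.

```python
from itertools import product

def satisfiable(cnf, n_vars):
    """Brute-force SAT check; literal k means variable |k| is True iff k > 0."""
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause) for clause in cnf):
            return True
    return False

# Toy obfuscated-circuit CNF over variables 1..3 (illustrative only).
cnf = [[1, -2], [2, 3], [-1, 3]]

# Lightweight UNSAT block on a fresh variable 4: (x4) AND (NOT x4)
# is unsatisfiable on its own, so the whole conjunction becomes UNSAT.
unsat_block = [[4], [-4]]

print(satisfiable(cnf, 3))                # True
print(satisfiable(cnf + unsat_block, 4))  # False
```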
Product bundling is a common selling mechanism used in online retailing. To set profitable bundle prices, the seller needs to learn consumers' preferences from transaction data. When customers purchase bundles or multiple products, classical methods such as discrete choice models cannot be used to estimate customers' valuations. In this paper, we propose an approach for using bundle sales data to learn consumers' valuations of products. The approach reduces the problem to an estimation problem in which the samples are censored by polyhedral regions. Using the EM algorithm and Monte Carlo simulation, our approach can recover the distribution of consumers' valuations. The framework allows for unobserved no-purchases and clustered market segments. We provide theoretical results on the identifiability of the probability model and the convergence of the EM algorithm. The performance of the approach is also demonstrated numerically.
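A hedged one-dimensional sketch of the estimation idea, with interval censoring standing in for the polyhedral regions and a Monte Carlo E-step: only the region containing each valuation is observed, yet EM still recovers the mean of the valuation distribution. The normal model, the price boundaries, and the known variance are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Ground truth: valuations ~ N(5, 1); we observe only which price
# region each falls in (a 1-D stand-in for polyhedral censoring).
true_mu, sigma = 5.0, 1.0
vals = rng.normal(true_mu, sigma, size=2000)
edges = np.array([-np.inf, 3.0, 4.5, 6.0, np.inf])
regions = np.searchsorted(edges, vals) - 1  # index of each censoring region

mu = 0.0  # deliberately poor starting estimate
for _ in range(50):
    # E-step (Monte Carlo): expected valuation given its region under current mu.
    draws = rng.normal(mu, sigma, size=20000)
    draw_regions = np.searchsorted(edges, draws) - 1
    cond_mean = np.array([
        draws[draw_regions == r].mean() if np.any(draw_regions == r) else mu
        for r in range(4)
    ])
    # M-step: the new mu is the average imputed valuation.
    mu = cond_mean[regions].mean()

print(mu)
```

The same E/M structure carries over when the censoring sets are polyhedra in higher dimensions; only the conditional simulation step becomes more involved.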
Training and inference with large neural models is computationally expensive. However, for many application domains, while new tasks and models arise frequently, the underlying documents being modeled remain mostly unaltered. We study how to reduce the computational cost of such settings through embedding recycling (ER): re-using activations from previous model runs when performing training or inference. In contrast to prior work, which focused on freezing small classification heads for finetuning and often led to notable drops in performance, we propose caching an intermediate layer's output from a pretrained model and finetuning the remaining layers for new tasks. We show that our method provides a 100% speedup during training and a 55-86% speedup for inference, and has negligible effects on accuracy for text classification and entity recognition tasks in the scientific domain. For general-domain question answering tasks, ER offers a similar speedup with a small loss of accuracy. Finally, we identify several open challenges and future directions for ER.
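A minimal sketch of the caching pattern described, with small random matrices standing in for a pretrained encoder's frozen lower layers and a task-specific head: the lower layers run once per document, and later tasks or epochs recycle the cached activation instead of recomputing it. All names and shapes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Frozen "lower layers" of a pretrained encoder (weights are stand-ins).
W_lower = rng.normal(size=(16, 8))
# Task-specific "upper layers" that would be finetuned per task.
W_upper = rng.normal(size=(3, 16))

cache = {}  # doc_id -> cached intermediate-layer activation

def lower(doc_id, x):
    """Run the frozen lower layers once per document, then recycle."""
    if doc_id not in cache:
        cache[doc_id] = np.maximum(W_lower @ x, 0.0)
    return cache[doc_id]

def predict(doc_id, x):
    return W_upper @ lower(doc_id, x)

docs = {f"doc{i}": rng.normal(size=8) for i in range(5)}

# The first pass touches every document: lower layers run 5 times total.
for d, x in docs.items():
    predict(d, x)
# A second task (or a later epoch) reuses every cached activation.
for d, x in docs.items():
    predict(d, x)
print(len(cache))
```

Since the expensive lower computation is skipped on every repeat visit, the speedup grows with how often the same documents recur across tasks and epochs.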
We present a novel, conditional generative probabilistic model of set-valued data with a tractable log density. The model is a continuous normalizing flow governed by permutation-equivariant dynamics. These dynamics are driven by learnable per-set-element terms and pairwise interactions, both parametrized by deep neural networks. We illustrate the utility of this model via applications including (1) complex traffic scene generation conditioned on visually specified map information, and (2) object bounding box generation conditioned directly on images. We train our model by maximizing the expected likelihood of labeled conditional data, with the aid of penalties that ensure the dynamics are smooth and hence efficiently solvable. Our method significantly outperforms non-permutation-invariant baselines in terms of log likelihood and domain-specific metrics (offroad, collision, and infraction rates), yielding realistic samples that are difficult to distinguish from real data.
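A small sketch of permutation-equivariant dynamics of the kind described, with a linear per-element term and a tanh pairwise interaction standing in for the deep networks: permuting the set elements permutes the output velocity field identically, which is what lets the flow define a distribution over unordered sets.

```python
import numpy as np

rng = np.random.default_rng(3)
W_self = rng.normal(size=(2, 2))
W_pair = rng.normal(size=(2, 2))

def dynamics(X):
    """Velocity field for a set of elements X (n, d): a per-element term
    plus summed pairwise interactions. Permuting the rows of X permutes
    the output rows identically (equivariance)."""
    per_element = X @ W_self.T
    pairwise = np.stack([
        sum(np.tanh(W_pair @ (x_i - x_j)) for x_j in X) for x_i in X
    ])
    return per_element + pairwise

X = rng.normal(size=(4, 2))
perm = np.array([2, 0, 3, 1])
out, out_perm = dynamics(X), dynamics(X[perm])
print(np.allclose(out[perm], out_perm))  # True: equivariant by construction
```

Equivariance holds because the per-element term acts row-wise and the pairwise term sums over the whole set, which is order-independent.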
Long-range transformer models have achieved encouraging results on long-context question answering (QA) tasks. Such tasks often require reasoning over a single long document, and they benefit from identifying a set of evidence spans (e.g., sentences) that provide supporting evidence for addressing the question. In this work, we propose a novel method for equipping long-range transformers with an additional sequence-level objective for better identification of supporting evidence spans. We achieve this by proposing an additional contrastive supervision signal in finetuning, where the model is encouraged to explicitly discriminate supporting evidence sentences from negative ones by maximizing the question-evidence similarity. The proposed additional loss exhibits consistent improvements on three different strong long-context transformer models, across two challenging question answering benchmarks: HotpotQA and QAsper.
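A hedged sketch of such a contrastive supervision signal, using a softmax over cosine similarities (the exact loss used is not specified here): the question embedding is pulled toward the true evidence sentence and away from negative sentences, so the loss is lower when the real evidence occupies the positive slot.

```python
import numpy as np

def contrastive_evidence_loss(q, evidence, negatives, temperature=0.1):
    """InfoNCE-style loss: discriminate the true evidence sentence from
    negatives by maximizing question-evidence similarity."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    sims = np.array([cos(q, evidence)] + [cos(q, n) for n in negatives])
    logits = sims / temperature
    logits -= logits.max()              # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])            # the true evidence is index 0

rng = np.random.default_rng(4)
q = rng.normal(size=16)
good = q + 0.1 * rng.normal(size=16)    # evidence similar to the question
bad = [rng.normal(size=16) for _ in range(5)]

# Loss is smaller when the true evidence, not a random sentence, is positive.
print(contrastive_evidence_loss(q, good, bad) <
      contrastive_evidence_loss(q, bad[0], [good] + bad[1:]))
```

In practice q and the sentence vectors would be embeddings produced by the long-range transformer, and the loss would be added to the usual QA finetuning objective.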
Black-box optimization requires specifying a search space to explore for solutions, e.g. a d-dimensional compact space, and this choice is critical for getting the best results at a reasonable budget. Unfortunately, determining a high-quality search space can be challenging in many applications. For example, when tuning machine learning pipelines on a new problem given a limited budget, one must strike a balance between excluding potentially promising regions and keeping the search space small enough to be tractable. The goal of this work is to motivate, through example applications in tuning deep neural networks, the problem of predicting the quality of search spaces conditioned on the budget, as well as to provide a simple scoring method based on a utility function applied to a probabilistic response surface model, similar to Bayesian optimization. We show that the method we present can compute meaningful budget-conditioned scores in a variety of situations. We also provide experimental evidence that accurate scores can be useful in constructing and pruning search spaces. Ultimately, we believe that scoring search spaces should become standard practice in the experimental workflow of deep learning.
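A hedged sketch of budget-conditioned search-space scoring, with a fixed synthetic function standing in for the probabilistic response surface model: a space's score is the expected best (lowest) value found by a given number of random trials inside it. A well-centered narrow space then beats a wide one at small budgets, while an off-center space scores poorly at any budget. The utility function and spaces are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

def surrogate(x):
    """Stand-in for a response-surface model's predicted validation
    error as a function of a single hyperparameter."""
    return (x - 0.7) ** 2 + 0.05 * np.sin(20 * x)

def score(space, budget, n_sims=2000):
    """Budget-conditioned utility of a search space: the expected best
    surrogate value found by `budget` random trials inside it."""
    lo, hi = space
    samples = surrogate(rng.uniform(lo, hi, size=(n_sims, budget)))
    return samples.min(axis=1).mean()

wide, narrow, off = (0.0, 1.0), (0.5, 0.9), (0.0, 0.3)
for budget in (2, 16):
    print(budget, score(wide, budget), score(narrow, budget), score(off, budget))
```

Comparing such scores across candidate spaces before spending the real tuning budget is the pruning use case the abstract argues for.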
We introduce the LongChecker system for scientific claim verification. Given a scientific claim and an evidence-containing research abstract, LongChecker predicts a veracity label and identifies supporting rationales in a multitask fashion, based on a shared encoding of the claim and abstract. We perform experiments on the SciFact dataset and find that LongChecker achieves state-of-the-art performance. We conduct analysis to understand the source of this improvement, and find that identifying the relationship between a claim and a rationale reporting a scientific finding often requires understanding the context in which the rationale appears. By making labeling decisions based on all available context, LongChecker achieves better performance on cases requiring this type of understanding. In addition, we show that LongChecker is able to leverage weakly-labeled in-domain data to facilitate few-shot domain adaptation for scientific claim verification.